Leyang Feng
June 12, 2019
A workshop covering reproducibility; version control; basic git workflows.
We are in the era of 'big data', but even if you work with 'little data' you have to acquire some skills to deal with those data.
Most fundamentally, your results have to be reproducible.
Your most important collaborator is your future self. It’s important to make a workflow that you can use time and time again, and even pass on to others in such a way that you don’t have to be there to walk them through it. Source
Reproducibility means scripts or programs tied to open source software.
…what doesn't exist.
…what you've lost. What if you need access to a file as it existed 1, 10, 100, or 1000 days ago?
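With a version control tool such as git, that question has a concrete answer. The sketch below is illustrative only: the file name `notes.txt`, the demo identity, and the back-dated commit dates are all hypothetical, chosen so that the history spans several months inside a throwaway directory.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
# Placeholder identity, set for this demo repository only.
git config user.name "Demo User"
git config user.email "demo@example.com"

# Record two versions of a file. The dates are hypothetical and are
# forced through environment variables so the history looks months old.
echo "first draft" > notes.txt
git add notes.txt
GIT_AUTHOR_DATE="2019-01-15T12:00:00" GIT_COMMITTER_DATE="2019-01-15T12:00:00" \
  git commit --quiet -m "First draft"

echo "second draft" > notes.txt
GIT_AUTHOR_DATE="2019-06-01T12:00:00" GIT_COMMITTER_DATE="2019-06-01T12:00:00" \
  git commit --quiet -am "Second draft"

# Find the most recent commit made before a given date ...
old_commit="$(git rev-list -1 --before="2019-03-01" HEAD)"
# ... and read the file exactly as it existed in that snapshot.
old_version="$(git show "$old_commit:notes.txt")"
echo "$old_version"
```

The working copy is left untouched: `git show` only prints the old contents, so the current version of the file stays in place.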
Git (and the website GitHub) are the most popular version control tools for use with R and many other languages:
file operations in your folder (create/change/delete)
just remember **git is watching you**
commit the changes so that git can create a snapshot of the current state of your folder
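The steps above can be sketched on the command line roughly as follows. This is a minimal example under stated assumptions: the file name `analysis.R`, its contents, and the demo identity are hypothetical, and everything runs in a temporary directory.

```shell
# Work in a throwaway directory so the sketch does not touch real projects.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

git init --quiet
# Placeholder identity, set for this demo repository only.
git config user.name "Demo User"
git config user.email "demo@example.com"

# 1. A file operation in the folder: create a script.
echo 'x <- 1' > analysis.R
git status --short          # git has noticed the new, untracked file

# 2. Stage and commit so git records a snapshot of the folder.
git add analysis.R
git commit --quiet -m "Add analysis script"

# 3. Change the file and commit again to create a second snapshot.
echo 'y <- 2' >> analysis.R
git commit --quiet -am "Add y to analysis"

git log --oneline           # lists the snapshots, newest first
commit_count="$(git rev-list --count HEAD)"
```

Each commit is one snapshot; `git log` shows the history that later lets you compare or restore earlier states.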
Using Git, now you have version control, but you still have:
Or, you want to:
pull requests…
A typical project/paper directory for me:
1-download.R
2-process_data.R
3-analyze_data.R
4-make_graphs.R
logs/
output/
rawdata/
This directory is backed up both locally and remotely, and is under version control, so it's easy to track changes over time.
0-download.R, 1-process_data.R, …)
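A project directory like the one described above could be scaffolded and put under version control along these lines. This is only a sketch: the project name `my-project` and the demo identity are hypothetical, and the `.gitkeep` placeholder files are a common convention rather than a git feature (git does not track empty folders).

```shell
# Create the project in a temporary location; "my-project" is a placeholder.
project_dir="$(mktemp -d)/my-project"
mkdir -p "$project_dir"
cd "$project_dir"

# Numbered scripts run in order; folders separate inputs, outputs, and logs.
touch 1-download.R 2-process_data.R 3-analyze_data.R 4-make_graphs.R
mkdir -p logs output rawdata

# Empty folders are invisible to git, so add a placeholder file to each.
touch logs/.gitkeep output/.gitkeep rawdata/.gitkeep

git init --quiet
# Placeholder identity, set for this demo repository only.
git config user.name "Demo User"
git config user.email "demo@example.com"

git add .
git commit --quiet -m "Set up project skeleton"

# Count the files git is now tracking (4 scripts + 3 placeholders).
tracked_count="$(git ls-files | wc -l | tr -d ' ')"
echo "$tracked_count"
```

Whether folders such as `output/` and `logs/` should be committed or ignored depends on the project; the sketch simply tracks everything.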